This report reviews selected voice recognition
experiments conducted at NPS. It includes a
brief description of selected experiments and
the findings. Suggestions for expansion of
areas of research and areas in which NPS has
not pursued research are indicated.

Rapid technological advances in Navy weapons have resulted
in  weapon  systems  with  substantial  increases  in  capability.  This 
technological sophistication has produced the operational
situation wherein the intended operator's capability to respond with
the  necessary  speed  and/or  accuracy  is  frequently  precluded.  The 
human  operator  may  be  placed  in  the  difficult  situation  where  he 
is  incapable  of  responding  to  the  myriad  of  inputs  impinging  upon 
him.  This  inadequacy  on  the  part  of  operators  can  and  does 
result  in  weapon  systems  not  reaching  the  full  potential,  or 
worse  yet,  being  rendered  ineffective  by  operator  failure. 

The  nature  of  the  problem  is  frequently  one  of  a  failure  to 
match  man's  capabilities  and  limitations  with  machine  system 
requirements. That is, machine capabilities have expanded
significantly as a result of technological developments. These
developments  have  resulted  in  added  system  complexity  as  well  as 
expanded  capability.  However,  the  interface  between  man  and 
machine  has  remained  relatively  constant.  Man  interacts  with 
machine  using  the  same  technology  that  was  employed  with  much 
simpler  and  less  capable  hardware  systems.  Therefore,  while  the 
nature  of  the  equipment  has  advanced  rapidly  in  recent  years, 
man's  capabilities  and  the  devices  provided  for  man  to  interact 
with  machines  have  remained  essentially  unchanged.  Such  a 
situation  frequently  produces  an  environment  wherein  the  operator 
cannot  satisfy  his  function  and  thereby  contributes  to  system 
inability to meet mission needs. Therefore, if the full
potential of complex new weapon systems is to be realized,
attention must be devoted to designing man-machine interfaces
which consider prospective operators in terms of system
objectives.

The above is not designed to suggest that innovative
techniques for man-machine communications have not been explored.
For example, numerous research efforts (e.g. Connolly, 1979; Lea,
1980; Lea and Shoup, 1979; Poock, 1981) have demonstrated
that speech represents a viable alternative to manual entry in
man-machine interaction. These efforts have supported the
hypothesis that in specific operational environments with certain
task types, speech may be more effective than the traditional
manual (e.g. hands, feet) control system. In fact, Lea (1980)
and Martin and Welch (1980) suggest that speech, as a result of
the frequency and intensity of use, is man's most "natural" and
perhaps universal response modality. Further, in situations
where speech can be effectively used as a human output-machine
input mechanism, it may serve to free the extremities for
functions incompatible with speech. Such a situation would serve
to expand human operator capability and thereby enhance the
possibility of the human element functioning successfully in an
increasingly complex and demanding military operational
environment.

Based on the above, the Naval Postgraduate School began to
consider the use of speech as a machine control medium in the
late 1970's. The present effort represents a review of some of
the major research efforts conducted at NPS and suggestions for
further research in general areas of speech recognition and
speech as an input modality for machines.


Overview of NPS Voice Recognition Research

Research on voice recognition at the Naval Postgraduate
School has attempted to examine the various elements which
possess the potential for improving overall system
performance. The elements influencing the efficiency with which
man interacts with machine include the following (Meister, 1971):

1)  Equipment-physical  characteristics 

2)  Environment-physical  surroundings 

3)  Task-nature  of  job(s)  performed 

4)  Personnel-capabilities,  limitations,  attitudes  and 
training 

Voice recognition research at NPS has attempted to examine
the variables suggested by Meister (1971) in order to gain an
appreciation for the potential effect each category may have on
voice recognition performance.

In  addition,  it  must  be  recognized  that  the  arrangement  or 
organization  of  the  above  variable  categories  represents  a 
system. Therefore, in addition to considering individual
parameters, it is essential that the combination of elements also
be considered. To accomplish this, the NPS research program
on voice recognition included research efforts directed at
performance assessment of simulated operational environments to
examine combinations of variable categories.

The present effort represents a review of completed voice
recognition research efforts conducted at NPS. It will include a
brief summary of each effort and a discussion of research work,
as suggested by current findings. The approach taken here will
begin with a discussion of student studies of individual elements
followed by combination efforts where multiple elements in a
simulated operational situation were investigated.

The voice recognition system used in the majority of NPS
voice recognition experiments was the Threshold Technology, Inc.,
model T600 voice recognition system. This system is a
discrete utterance recognizer. (An utterance is defined as a
single word or continuous string of words not exceeding two
seconds in duration.) The T600 consists of a noise cancelling
microphone, analog speech preprocessor, microcomputer, CRT and
keyboard unit and a magnetic tape cartridge unit. The system
operates by having a subject establish reference speech patterns
for a specific vocabulary during a training period. Training
consists of having a subject "train" the voice recognizer by
repeating each utterance to be used ten times. Training provides
a basis for comparison during operation. Training can be
accomplished in fewer than ten repetitions; however, the manufacturer
has suggested ten repetitions for maximum recognition. Following
training, an operator speaks the utterance into the microphone,
followed by acceptance of the utterance by the speech processor.
The speech processor extracts speech parameters from the input and
converts them to digital signals for processing by the microcomputer.
Possible responses by the system consist of a match between the
spoken utterance and the stored vocabulary; a misrecognition (i.e.
the system failing to accurately recognize the spoken utterance); or
a non-recognition, in which the system responds with a "beep".

A more detailed description of the T600 and its operational
characteristics can be obtained in Armstrong (1980).
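
The train-then-compare cycle and the three possible responses can be
illustrated with a minimal sketch. It is not the T600's actual
algorithm, which is not described here. It assumes each utterance has
already been reduced to a fixed-length list of numbers, averages the
training repetitions into one reference pattern per word, and treats
anything farther than a hypothetical `reject_distance` from every
pattern as a "beep". All names are illustrative.

```python
from math import dist
from statistics import fmean

REJECT = None  # stands in for the recognizer's "beep"


def train(repetitions_by_word):
    """Average each word's repeated feature vectors into one reference pattern."""
    return {
        word: [fmean(column) for column in zip(*reps)]
        for word, reps in repetitions_by_word.items()
    }


def recognize(templates, features, reject_distance=1.0):
    """Return the closest vocabulary word, or REJECT if nothing is close enough.

    reject_distance is a hypothetical tuning value, not a documented setting.
    """
    word, distance = min(
        ((w, dist(t, features)) for w, t in templates.items()),
        key=lambda pair: pair[1],
    )
    return word if distance <= reject_distance else REJECT


def score(spoken, response):
    """Classify one trial as a match, a misrecognition, or a non-recognition."""
    if response is REJECT:
        return "non-recognition"
    return "match" if response == spoken else "misrecognition"
```

A speaker-dependent recognizer of this kind compares every input
against the whole trained vocabulary, so the chance of a misrecognition
would be expected to grow with vocabulary size and with the similarity
of the stored patterns.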


Environment-Physical Surroundings

In any situation involving speech communication, noise
represents a potentially disrupting influence. This potential is
particularly important in situations where accurate communication
is critical (McCormick and Sanders, 1982). Accuracy of reception
in the presence of noise is an essential criterion for any input
system being considered by the military. The operational
military environment is frequently characterized by high noise
levels, and investigation of the potential impact of noise on voice
recognition must be considered of merit.

Elster (1980) conducted a study in which the potential impact of
environmental noise on speech recognition was investigated.
Elster's experiment used the T600 voice recognition system
expanded to 256 two-second utterances. In Elster's
experiment, however, he limited the investigation to 50
utterances.


Independent variables consisted of a) noise level during
training of the system, and b) noise level during testing of the
system. Noise levels for both independent variables were
identical and consisted of: ambient noise (average of Jtf dBA),
conversational noise (average of 65 dBA), and a second
conversational noise level (average of 75 dBA). In all three noise
conditions deviation from the average did not exceed ± 7 dBA.


Each subject in the study trained the T600 under one of the
above described noise conditions and was tested in each of the three
noise conditions, six subjects being randomly assigned to each
training condition.

The procedure followed required each subject to train the system
on the selected 50 utterances. Training was considered
acceptable when an utterance was correctly recognized two out of three
times. Testing of the system required each subject to voice each
utterance once under each noise condition. An error was scored
if the voice recognizer responded with a "beep", the wrong word,
or produced output when no utterance was produced by the subject.
In a second analysis of the data, Elster removed the variable of
"no utterance", leaving only "beeps" and wrong responses.
However, in the final analysis of variance, removal of "no
utterance" did not influence the results.
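
The layout of this design and the two ways of counting errors can be
sketched as follows. This is only an illustration under stated
assumptions: the condition labels, the subject identifiers and the
response codes are hypothetical names, and only the counts (three
noise conditions, six subjects per training condition) come from the
description above.

```python
NOISE_LEVELS = ("ambient", "conversational_low", "conversational_high")
SUBJECTS_PER_TRAINING_CONDITION = 6


def build_sessions():
    """List every (subject, training noise, testing noise) test session.

    Training noise varies between subjects; testing noise varies within
    each subject, so every subject appears once per testing condition.
    """
    sessions = []
    for train_noise in NOISE_LEVELS:
        for n in range(SUBJECTS_PER_TRAINING_CONDITION):
            subject = f"{train_noise}-{n + 1}"  # hypothetical subject label
            for test_noise in NOISE_LEVELS:
                sessions.append((subject, train_noise, test_noise))
    return sessions


def count_errors(responses, include_no_utterance=True):
    """Count scored errors among response codes for one session.

    Codes are hypothetical: "correct", "beep", "wrong_word", and
    "no_utterance_output" (output produced when nothing was spoken).
    """
    errors = {"beep", "wrong_word"}
    if include_no_utterance:
        errors.add("no_utterance_output")
    return sum(1 for r in responses if r in errors)
```

Calling `count_errors` with `include_no_utterance=False` corresponds to
the second analysis, in which only "beeps" and wrong responses were
retained.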

Overall results of Elster's effort suggested that the
presence of noise during testing influenced performance, as
measured by the experiment. Specifically, Elster observed that a
conversational noise background of 75 dBA produced more errors
than either 3b dBA or 65 dBA. However, no difference was
observed between 3b and 65 dBA. Elster did not observe a
relationship between training and testing background noise
levels.

This observation did not support the findings of Drennen
(1980) as reported during a DoD sponsored conference on voice
interactive systems. Drennen reported an interaction between
training background noise and testing background noise levels.
Specifically, Drennen observed that testing performance was enhanced
when training occurred at noise levels similar to those
experienced during testing. The difference between the findings of
Elster and Drennen could reside in the fact that Drennen's noise
levels were considerably higher in intensity (100 dB as measured
on an SPL meter) than those employed by Elster. In fact, Drennen
is of the opinion that interaction between testing and training will
not be observed until background dB levels approach 100 dB.

The efforts of Elster were significant in that they produced
evidence of potential performance degradation of voice
recognition systems under consistent background noise levels.
This work needs to be expanded to examine more extreme levels of
background noise and different types of noise (i.e. different
noise sources). For example, it may be possible that machinery
noise will impact differently than conversational noise.
Further, the difference between Drennen and Elster on the
influence of noise during training on subsequent testing should
be pursued.

Elster's study investigated the most obvious environmental
influence, noise, specifically conversational noise. As
suggested above, the study should be expanded to investigate
other noise sources (e.g. impact and machinery noises) as
well as other noise parameters and their effect on speech
recognition system performance.

However, it will be interesting, and in fact necessary before
implementation, to investigate the potential impact of other
environmental factors, for example the vibration,
acceleration/deceleration and pressure variation found in many military
situations. These factors, as well as any other potential
environmental influences that may exist in the work station, need
to be considered prior to acceptance or rejection of the system.

Therefore, it must be concluded that considerable research
on environmental factors and their influence on system
performance remains to be done. Elster's efforts represent an excellent
beginning in the area of constant noise but should be followed
with additional studies in the total area of environment.

Task  Variables 

Another area of interest is the nature of the task to be
performed. The Naval Postgraduate School research program has
devoted considerable attention to the influence task difference
may exercise on overall system performance. These efforts have
concentrated on attempting to simulate operational tasks.

Jay (1981) considered speech recognition as a means of
improving speed and reliability in the intelligence community
task of imagery interpretation. Military imagery interpretation
is essentially the analysis of a display in terms of "what",
"who", "when" and "where". It is designed to aid the commander
in the decision making process and is a major factor in command
and control. Current systems provide for man-machine
communication through the use of a keyboard. Jay investigated
the possibility of improving imagery interpretation by improving
the man-machine interface. It was the opinion of Jay that
improving the interaction would result in improved use of man's
skills by allowing the operator to concentrate on image analysis
as opposed to concentrating on inputting information into the
system via a keyboard.

The  research  effort  was  designed  to  determine  whether  or  not 
a  currently  available  voice  recognition  system  could  be  employed 
for  reporting  imagery  derived  from  intelligence  information  using 
an  interactive  computer  system.  The  question  to  be  answered 
involved  a  determination  of  any  significant  differences  in 
speed,  accuracy,  efficiency  and  subject  attitudes  regarding 
manual  keyboard  input  methods  versus  voice  entry. 

Equipment employed in the study included the T600 voice
recognizer and a G 1130 Harvard Tachistoscope for simulating
the  optics  portion  of  an  imagery  interpreter's  task.  The 
Tachistoscope  is  a  sophisticated  instrument  which  provides  a 
mechanism  for  the  presentation  of  visual  information.  It  can  be 
programmed  to  present  or  change  stimuli  at  specific  intervals  or 
allow  the  viewer  to  change  presented  information  at  will.  In 
Jay's  experiment  the  Tachistoscope  was  used  to  simulate  the 
visual  task  of  imagery  analysis  and  subsequent  reporting. 
Thirty-six  stimulus  cards  were  used  in  the  experiment.  Content 
for  the  cards  was  judged  on  the  basis  of  realism,  even  mix  of 
ground, air and naval terms, use of USSR/Warsaw Pact vocabulary,
and maintenance of a balance in number of characters in sets of
stimuli.

The T600 used in the experiment had an expanded memory
providing for 256 discrete utterances. In addition, two
recognition modes were used: buffered and unbuffered. In the
unbuffered mode, the system outputs to the computer immediately
following voice input. In the buffered mode, up to 12b utterance
output strings could be stored sequentially in the buffer for
subsequent output as a block of characters. The vocabulary used
consisted of 256 utterances. Included were the phonetic
alphabet, numbers 0 - 2b, administrative alphanumerics, special
symbols and control characters, and air, ground and naval forces
equipment vocabulary. In Jay's effort no attempt was made to
limit the comparison set for an utterance. Rather, the entire
vocabulary of 256 utterances was used for each spoken utterance.
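
The difference between the two recognition modes can be made concrete
with a small sketch. It is a toy model rather than the recognizer's
firmware: `send` stands in for the link to the host computer, and the
`capacity` argument is a hypothetical parameter for the limit on stored
output strings.

```python
class VoiceOutput:
    """Toy model of unbuffered versus buffered output of recognized utterances."""

    def __init__(self, send, buffered, capacity=None):
        self.send = send          # callable that receives a string of characters
        self.buffered = buffered  # True for buffered mode, False for unbuffered
        self.capacity = capacity  # hypothetical limit on stored output strings
        self.pending = []

    def utterance(self, output_string):
        """Handle the output string associated with one recognized utterance."""
        if not self.buffered:
            self.send(output_string)  # unbuffered: forward immediately
            return
        if self.capacity is not None and len(self.pending) >= self.capacity:
            raise OverflowError("buffer full; flush before adding more")
        self.pending.append(output_string)  # buffered: hold for later

    def flush(self):
        """Release everything held so far as one block of characters."""
        if self.pending:
            self.send("".join(self.pending))
            self.pending.clear()
```

In buffered mode the operator can review or correct a held string
before it reaches the host, at the cost of an explicit release step.
In unbuffered mode each recognition result, right or wrong, is
transmitted at once.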

Manual entry of information was accomplished by means of a
keyboard. Subjects were given a typing test prior to
participating in the actual experiment. Based on the results of
the test, subjects were classified as either "fast" or "slow".
Slow typists' scores ranged from 17 to 32 words per minute (wpm)
with an average of 2b wpm. Fast typists' scores ranged from 33 to
bb wpm with an average of 43 wpm.

Included in the effort of Jay was consideration of
interactive text editing. This feature was provided by facilities at
ARPANET. Host computers in ARPANET were used to conduct and
manage the experimental as well as the interactive computer environment.

Subjects consisted of 20 volunteers. Of the twenty,
eighteen were military and two civilian. Nineteen of the
subjects were male and one was female. Most subjects (lb) had
observed a demonstration of the voice system. Twelve had
actually used the system in one capacity or another and 11 had
researched voice for a report.

For purposes of the experiment, all subjects individually
trained the system with the 256 word vocabulary. Training
included orientation on proper methods in system training prior
to entering utterances into memory. Following training of the
system each utterance was repeated three times. The criterion for
considering the system "trained" was two correct out of three
training trials. Any utterance not meeting criterion was retrained.
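
The two-out-of-three criterion amounts to a simple check per
vocabulary item. The sketch below is illustrative only. The function
names and the dictionary layout are assumptions, and each trial is
represented as a boolean indicating whether the repeated utterance was
recognized correctly.

```python
def meets_criterion(trial_results, required=2):
    """True when at least `required` verification trials were recognized correctly."""
    return sum(bool(r) for r in trial_results) >= required


def utterances_to_retrain(verification):
    """Return the vocabulary items whose verification trials fall short of criterion."""
    return [
        word for word, trials in verification.items()
        if not meets_criterion(trials)
    ]
```

For example, an item recognized on only one of its three repetitions
would be returned for retraining, while an item recognized on two or
three would be accepted as trained.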

Jay  identified  attitudes  toward  use  ot  voice  in  a  situation 
normally  involving  manual  entry  as  an  important  variable  worthy 
of  consideration.  A  questionnaire  was  developed  to  assess 
attitudes of subjects regarding voice entry vs manual entry.
Questions probed attitudes of subjects relative to accuracy,
speed, training, flexibility, etc. The questionnaire was
administered before and after actual testing.

The results of Jay's efforts were impressive in supporting
the possibility of using voice entry in the work environment and
task type investigated. Jay observed that in reporting speed a
highly significant difference ( F < .ouuu) existed between
explored entry modes. Jay observed that on the average, the unbuffered
voice condition was 41% and buffered voice bd > faster than typed
data entry. The author postulated that voice data entry allowed
subjects to compose a report while simultaneously receiving
information. This condition did not exist with manual entry.

Learning over trials was observed in all data entry modes.
It was, however, interesting that no significant difference was
observed between fast typists and slow typists. The lack of a
significant difference as a result of previous experience and
competence as a typist may suggest that the typing task is
sufficiently different from the task of interest here to render
previous experience and capability of little value.

In the real world intelligence environment reporting
accuracy would be as important, if not more important, than
reporting speed. In terms of reporting accuracy, no significant
difference existed between the conditions investigated. Fast
typists, slow typists, and entry modes were not significantly
different.

The third variable considered involved efficiency of
reporting. In terms of efficiency, typing was considered the
most efficient (yb%), buffered voice entry next (lev*) and
unbuffered voice least efficient (tfUi).

Jay suggested that efficiency differences may be the result
of differences in exposure to the entry modes examined. That
is, as a result of rather extensive exposure to keyboards prior
to the experiment and limited voice entry exposure, keyboard was
superior. This conclusion does appear to conflict with the
earlier suggestion relative to the relationship of exposure and
manual entry performance. This hypothesis deserves further study
as it was not totally supported by the results of other aspects
of the study. That is, observations with other variables (e.g.
speed, accuracy) do not necessarily support Jay's conclusion.
Additional exploration of the experience question would be of
great merit. It certainly suggests the need to attempt to equate
experience levels in future studies.

In operational settings the question of voice entry system
accuracy over time could certainly be an important consideration.
The prospect of time-on-task or just elapsed time impacting on
speaker performance would seem a reasonable assumption. Results,
however, suggested that time did not degrade voice recognition
system performance. The T600 performance in recognition accuracy
over trials was 97% if only voice recognition errors were
considered and 95.5% if rejects were included.

Subject Attitudes

As suggested earlier, Jay correctly identified user
attitudes as a potentially significant factor in overall system
performance. As a result of the questionnaire administered to
determine subject attitudes, Jay considered subject attitudes to
be generally positive relative to use of voice. Of particular
interest was subject evaluation of speech entry following the
experiment. Opinions expressed by participants were more
positive at the conclusion of the effort than prior to
commencement. There is no question that it is a significant advantage
when potential users accept a proposed system. Rejection can
lead to inefficiency and overall degradation of system performance
which can be eliminated only with extensive training and
considerable experience. However, operator acceptance is not
sufficient reason for acceptance of a system. For example, most
subjects express a definite preference for color displays and
estimate their performance as being superior with the color
display when compared with black and white. Research evidence does
not always support the opinion of the subject, in that in some
tasks, while preference may be for color, performance favors black
and white. The point is that preference alone may not
necessarily mean performance will be enhanced. This should not
be construed, however, to minimize the significance of operator
acceptance.

In summary, it must be admitted that Jay's study did not
provide evidence of a clear superiority for manual or voice
entry. The results were mixed, indicating advantages in specific
situations for each entry mode. Such a finding is not unique and
suggests the need for additional research to identify those tasks
where voice entry can provide for performance enhancement and/or
identify tasks which seemingly are not well suited to voice
entry.

McSorley (1981) also examined the potential of voice
recognition in an applied setting. The author's objective was to
examine the possibility of operating a Warfare Environmental
Simulator (WES) by voice entry rather than the traditional manual
method. The procedure utilized involved entry of commands via
voice or manual entry and subsequent evaluation of the two entry
methods.

The WES wargame is computer assisted and consists of a
two-sided interactive process in which two sides (blue and orange)
can define, structure and control their own forces. It is a naval
war game exclusively and involves 80 commands for control of
platforms and sensors involved in the game. Commands used by
players are highly formatted in terms of syntax and input
parameters. Input errors can consist of incorrect syntax or an
impossible action. Syntax errors result in immediate
notification and can be corrected immediately. Impossible action
errors do not result in immediate warning. Notification of an
impossible action is not displayed until execution is attempted.

WES requires that a specific scenario be selected.
The scenario employed by McSorley was the CUBA scenario, chosen
because of its simplicity and adequate forces to meet objectives
of the effort. In the scenario U.S. vessels consist of the
aircraft carrier Enterprise, guided missile destroyer Berkley and
the nuclear submarine Sturgeon. Opposition forces consisted of
three Soviet warships and a merchant vessel in a situation
similar to the 1962 Cuban missile crisis.

Equipment consisted of the Threshold T600 voice recognition
unit, an ADM 31 Data Display Terminal and a Miniterm Model
system. The T600 was used for voice entry, the ADM 31 for manual
entry and the Miniterm was used to provide a hard copy printout
of input commands for scoring performance.

Subjects 

Twelve  volunteer  subjects  participated  in  the  study.  Eleven 
of  the  subjects  were  male  military  officers  and  one  was  a  female 
civilian member of the NPS faculty. Subjects had varying levels
of experience with WES, with the faculty member being quite
experienced with the wargame. Familiarity with the voice
recognition system also varied from experienced to inexperienced
with six being assigned to each category.


Training involved use of the WES vocabulary which consisted
of 1 b2 utterances. As in the previous efforts, once the voice
recognition system was trained, each vocabulary utterance was
repeated three times. Utterances correctly identified two of the
three  times  were  considered  "trained".  Utterances  failing  to 
meet  the  criterion  were  retrained. 

Typing  ability  was  assessed  using  a  b  minute  typing 
exercise.  The  exercise  consisted  of  two  standard  paragraphs 
totalling 21 lines. Typing ability ranged from 20 to 40 wpm.

The actual experiment consisted of 20 basic WES commands.
These commands totalled Iby utterances and involved 67 of the ib^;
utterances deemed necessary to conduct an actual WES war game.
The 20 commands were segmented into five groups, each consisting
of four commands. Subjects input the 20 commands as five groups
of four commands in each of three input modes. These three input
modes consisted of buffered voice, unbuffered voice and manual
(typing). Input methods were randomly assigned in terms of order
of presentation.

Performance  measures  were  as  follows: 

(1) Time required for input

(2) Input error.

Input errors involved recognition errors and operator errors.
An error involving a misrecognition by the T600, i.e. an utterance
that was not correctly identified, was considered a recognition error.
This form of input error was obviously not applicable to manual
entry. Operator error was essentially any other error that could
not be classified as a recognition type error.

Results of McSorley's effort suggested that manual entry
resulted in fewer errors and faster input than either buffered or
unbuffered voice entry. Under the unbuffered voice condition, of the
67 utterances required to form the commands, 46 had been
misrecognized at least once. Twenty-one of the utterances were
never misrecognized. In terms of total errors, manual entry
resulted in 164 errors, buffered voice 542 and unbuffered voice
701. Therefore, manual entry resulted in 68.8 percent fewer
errors than buffered and 75.9 percent fewer errors than
unbuffered voice.

Speed of entry also favored manual input. Total time for
input using manual entry was 254.35 minutes, 286.17 for buffered
and 585.7 for unbuffered. Typing was therefore 11.1 and 56.6
percent faster than buffered and unbuffered voice entry.
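
The percentage comparison can be reproduced with a short calculation.
This sketch assumes the reported totals read as 254.35 minutes for
manual entry, 286.17 for buffered voice and 585.7 for unbuffered voice,
and it assumes that "percent faster" means the time saved relative to
the slower mode. The function name is illustrative.

```python
def percent_less(reference, other):
    """Percent by which `reference` falls below `other`, to one decimal place."""
    return round(100 * (other - reference) / other, 1)


MANUAL_MIN = 254.35      # total input time, manual entry (assumed reading)
BUFFERED_MIN = 286.17    # total input time, buffered voice (assumed reading)
UNBUFFERED_MIN = 585.7   # total input time, unbuffered voice

vs_buffered = percent_less(MANUAL_MIN, BUFFERED_MIN)
vs_unbuffered = percent_less(MANUAL_MIN, UNBUFFERED_MIN)
```

Under these assumptions the two values come out at 11.1 and 56.6
percent.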

Experience was a definite factor in time required for entry.
However, while unbuffered voice appeared to be the most
dramatically affected, relative position did not change.
Experience did not impact on recognition errors.
Operator errors (e.g. spelling and typing errors for manual entry
and forgetting procedures in voice entry) favored buffered voice
entry with no difference between manual entry and unbuffered
voice.

Results of McSorley's effort seemingly favor manual entry,
particularly over unbuffered voice in the experimental task.
There are, however, several considerations which require
clarification. First, error measurement was not the same for
voice and manual entry. Voice entry total errors included
recognition errors and operator errors whereas manual entry assessment
was restricted to typing errors/operator errors. The impact of
this difference in the applied sense is obviously unknown. It
may be that in an operational setting the fact that error
measurement was not the same is really of little importance.
However, it may also be that the difference does not allow the
accurate assessment of the two techniques of data entry.

It may be that voice entry is simply not appropriate for
tasks of the type examined by McSorley. The results support the
requirement to examine the nature of the task prior to concluding
that application is or is not appropriate.

Kuess (1982) studied the applicability of discrete
utterance voice recognition in a simulated loading and
retargeting situation for Air Launch Cruise Missiles (ALCM). As in
previous efforts, Kuess compared voice entry and traditional
manual (keyboard) entry of information in a retargeting
situation. A second dimension, although not directly involved
with entry method, was an examination of display techniques and
their effect on performance.

Kuess measured speed of input, accuracy of input and time vs
accuracy of entry methods. For keyboard entry, a working model
of the integrated keyboard (IKB) used in the B-52G was
constructed. The integrated keyboard was interfaced with an
Apple II microcomputer via a Lear-Siegler ADM 3A terminal.
Information input via the keyboard was stored in memory for later
printout on a MICROLINE Micron 80 printer.

Voice entry was accomplished using the T600 voice
recognition system. In the Kuess study the only major difference
from previously observed investigations was the use of the Apple
II computer.

Subjects consisted of 20 volunteers. Subject population was
comprised of 16 male and four female participants. Seventeen of
the 20 were military and three were civilians. Five of the
subjects had previous voice entry experience and no subjects had
IKB experience.

In the case of both entry methods, subjects were allowed a
familiarization/training period. Voice entry training was
similar to previously discussed efforts.

Experimental design involved each subject entering 20 ALCM
target sites. Entry mode selected was determined using the ABBA
ordering technique. A second aspect of the study involved a
"Target Information" verification. The purpose of this portion of
the experiment was to compare display techniques. Subjects were
required to make changes in certain target sets where deliberate
errors had been introduced by the investigator. Certain sets
required no input while others required a total of nine
modifications. Each of the three groups of six sets required a
total of 26 randomly distributed changes.
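
ABBA ordering is a common counterbalancing scheme in which two conditions alternate in the pattern A, B, B, A so that order effects are spread evenly across both. The sketch below shows one way such a sequence could be generated for 20 entries. How the pattern was actually mapped onto target sites or subjects is not stated in the text, so the function name, the labels, and the choice of which mode comes first are all assumptions made for illustration.

```python
from itertools import cycle, islice


def abba_sequence(n_trials: int, first: str = "keyboard", second: str = "voice") -> list[str]:
    """Return n_trials entry modes following a repeating A, B, B, A pattern."""
    pattern = [first, second, second, first]
    return list(islice(cycle(pattern), n_trials))


# Hypothetical assignment of entry mode to 20 target sites.
order = abba_sequence(20)
print(order[:8])
print(order.count("keyboard"), order.count("voice"))
```

With a trial count that is a multiple of four, each mode is used equally often and each occupies early and late positions equally.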

In the "Target Loading Task" results were obtained and analyzed
for time, accuracy and time vs accuracy. In this task keyboard
entry was faster for 15 of the nineteen subjects. Of the four
remaining subjects, one demonstrated no difference between entry
modes and three subjects had faster entry times using voice.
Statistical analysis supported that IKB was faster than voice
entry (P < .05). These results were similar to the findings of
McSorley and in conflict with the findings of Jay. Overall, IKB
was more accurate than voice with a P < .01.

In terms of output accuracy, there was no significant
difference between voice and manual entry.

Overall, Kuess' experiment suggested manual entry was
superior to voice in terms of speed and input accuracy. Output
accuracy, the validation of data prior to actual entry, indicated
no difference between entry methods.

Wolfe and Taggart (1981) compared voice and manual entry in
an operational data entry task. Wolfe and Taggart wrote a
computer program which would simulate data entry capabilities of
the P-3C operational software.

The authors attempted to simulate an operational data input
function using an operational vocabulary. The input function
selected for test involved the TACCO preflight data entry in the
P-3C ASW patrol aircraft. The actual task involved entering
preflight data in the Stores Management and Navigation Preflight
tableaux. In order to accomplish the task, three tableaux from
the operational software were used in the simulation. These
were INDEX, STORES MANAGEMENT, and NAV PREFLIGHT. The INDEX
tableau provided a comprehensive listing of the tableaux
available to operators in the operational system. The INDEX
tableau allows operators to select the desired tableau, in the
case of Wolfe and Taggart's experiment either STORES MANAGEMENT
or NAV PREFLIGHT, by inserting the appropriate command. Once a
tableau is displayed, operators can interact with it by means of
a data entry system.

Wolfe and Taggart were interested in examining whether or
not voice entry held any advantages over the traditional manual
entry (keyboard) system. The authors selected speed of input and
accuracy as their performance measures.

Equipment included the T600 voice system, a Datamedia Elite
2500 CRT and a keyset. The effort differed from some previous
experiments in that a PDP 11/50 computer was used in the
experiment. Display and entry (keyset) were displaced to require
subjects to divert their attention away from entry systems as is
true in the operational environment.

Thirteen volunteers served as subjects. Subjects included
twelve male officers and one female officer. All subjects had
prior experience with keyboard entry with a wide range of exposure
levels. Four of the subjects were TACCOs and had experience
with the data entry task. One of the thirteen had previous
experience with voice recognition equipment.

In preparation for participation, subjects were administered
a typing test and provided with information regarding use of
voice recognition equipment. Subjects were then allowed to
familiarize themselves with voice recognition systems and
instructed in the "training" of the system. Subjects trained the
system on the 61 vocabulary utterances following familiarization.
A departure from earlier efforts was that Wolfe and Taggart set
the criterion for acceptance of training at three out of four
correctly recognized utterances, rather than two out of three.

Stage two of the study consisted of subjects actually
filling in the data required for the STORES MANAGEMENT and
NAV PREFLIGHT tableaux. Subjects had to insert data twice, once
manually and once using voice. Entry method sequence was
randomly assigned to subjects. Stage two followed stage one by
one to three days.

Stage three followed stage two by one to three days. This
consisted of reversing the entry sequence. That is, subjects who
started with voice in stage two started with manual during stage
three. Those who started with manual in stage two started with
voice in stage three.

As suggested earlier, Wolfe and Taggart selected speed of
input and accuracy as their performance measures. Performance
evaluation was based on operational error rate and time required
to enter data. Operational errors were defined as entry errors
which went undetected by subjects and therefore remained
following completion of the data entry task. Input errors were
recorded but only for consideration in analyzing overall input
time.

Tableaux selected for study (i.e., STORES MANAGEMENT and
NAV PREFLIGHT) were used as a result of different input
requirements. STORES MANAGEMENT was selected because one
utterance could provide more than one bit of data output. This
was considered to be the most advantageous condition. NAV
PREFLIGHT was considered the less advantageous situation as a
result of one utterance providing one bit of data.

As all subjects participated in two trials, analysis of
entry time examined the effect of trials and entry method.

Results indicated that in STORES MANAGEMENT, voice was faster
than keyset entry in both trials. It was interesting that keyset
manifested a 9.1 percent improvement in time between trials one
and two, while voice entry experienced a 12.6 percent improvement
in entry time between trials one and two (P < .01). However,
statistical analysis revealed no significant interaction between
entry method and trial.

In Navigation Preflight data entry a similar situation was
demonstrated. Keyset data entry was 11.6 percent faster on trial
two than trial one, and voice was [5 or 6].4 percent faster (P < .15).
While not reaching the desired level of statistical significance,
the findings are suggestive. Comparison of entry time for the
two tableaux revealed that in STORES MANAGEMENT voice was 9.7
percent faster than manual (P < .1). Again the results were not
significant at the desired level; however, results do suggest
that for STORES MANAGEMENT voice input was faster. In NAV
PREFLIGHT keyset was 14.1 percent faster than voice entry with
the findings statistically significant (P < .01).

The findings support Wolfe and Taggart's belief that task
characteristics may be a contributing factor in entry speed
performance. That is, character by character input vs. multiple
character input may dictate which method of entry is superior.
Between subject variability in entry speed performance was also
manifested in certain aspects of the experiment. Analysis
indicated a difference between subjects on the STORES MANAGEMENT
task but not NAV PREFLIGHT. Once again the data suggests that
characteristics of the task may influence overall performance.

Experience level was also considered as a possible factor
in entry performance. The typing test administered prior to the
actual conduct of the experiment was used to segregate subjects
into fast typists (30 wpm or greater) and slow typists (less than
30 wpm). No difference in entry speed was observed between the
two groups.

Data had also been recorded on Warfare Specialty. It will
be recalled that four of the subjects had had experience with
manual entry of the type of data used in the study. In the first
trial no difference was observed between the "TACCO" group and
the "non TACCO" group. However, on the second trial the "TACCO"
group was 23 percent faster with a statistical significance of
P < .01.

In terms of operational errors (i.e., errors which remained
at completion of a trial) no significant difference was observed
between the entry methods. Unlike entry speed, trials did
interact with performance.

Entry errors were not a prime concern of the effort.
However, entry errors were considered in the analysis and
differences between entry modes were observed. The error rate
for voice entry was 2.4 percent and manual entry 1.2 percent.
The difference was statistically significant (P < .025). However,
it should be noted that with the observed percentages a small
shift in the absolute error rate would result in a much greater
shift in the relative rate. Therefore, the findings could be
misleading if strictly interpreted.
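
The caution about absolute versus relative rates can be made concrete with a small calculation. The sketch below uses the two reported error rates (2.4 and 1.2 percent); the half-point shift applied to the voice rate is a purely hypothetical value chosen for illustration, and the function name is likewise an assumption.

```python
def relative_ratio(rate_a: float, rate_b: float) -> float:
    """Ratio of two error rates (how many times larger rate_a is than rate_b)."""
    return rate_a / rate_b


voice_error = 2.4    # percent, as reported
manual_error = 1.2   # percent, as reported

# Observed: voice errors are twice as frequent as manual errors.
observed = relative_ratio(voice_error, manual_error)

# Hypothetical: voice error rate drops by half a percentage point.
shifted = relative_ratio(voice_error - 0.5, manual_error)

print(observed, round(shifted, 2))
```

A change of half a percentage point, which is small in absolute terms, moves the ratio from 2.0 to about 1.58, which is why a "twice as many errors" reading of such low rates can overstate the practical difference.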

Wolfe and Taggart suggested some potential reasons for voice
entry errors in their presentation. They observed, for example,
that the voice recognition system experienced considerable
difficulty with the operational vocabulary. That is, in the
vocabulary there were several utterances which were very similar
(e.g., "thirteen long", "fifteen ring", "sixteen long", etc.). In
fact, eight utterances, or 13 percent of the total, resulted in
[illegible] percent of the entry errors, and five percent of the
vocabulary (3 words) accounted for just over 41 percent of the
errors. Elimination of these troublesome utterances would
probably have improved overall performance of voice entry
considerably. Certainly vocabulary selection is a variable
demanding attention in a comparison of entry modes.

An interesting observation in the Wolfe and Taggart effort
was the sampling of subject opinion relative to entry modes. The
authors had subjects respond to a questionnaire regarding mode of
entry preference. Twelve of the subjects suggested voice was the
preferred entry mode in the STORES MANAGEMENT task. Reasons for
their preference included freeing the eyes for verification of
input and a decrease of fatigue, which they related to a decrease
in the probability of producing errors. In terms of the NAV
PREFLIGHT task the responses were generally neutral in terms of
entry mode preference.


The overall conclusion of the effort seems to favor voice
entry for the STORES MANAGEMENT task and manual entry for the
NAV PREFLIGHT task. These findings are significant in that they
suggest a possible relationship between nature of the task and
entry mode. This very important question needs further
development. The tasks examined have evolved with manual entry
considered as the entry method. While in some cases slightly
inferior to manual entry, voice has certainly compared favorably
in all cases. The question of performance effectiveness if the
task had been designed with voice entry in mind as the entry mode
needs to be researched before a firm conclusion of superiority
can be reached.

The  results  of  studies  on  operational  type  tasks  suggest  a 
number  of  areas  in  which  further  research  is  required. 

Studies should be developed which concentrate on:

(1) nature of tasks

(2) experience levels

(3) training (e.g., criteria for suggesting system is "trained")

(4) attitudes

The data does support the possibility of voice entry in the
operational control. Jay's work and Wolfe and Taggart's effort
in particular suggest the value of voice entry in certain
environments/tasks.

Personnel

Batchellor (1981) was concerned with the potential influence
of certain personal characteristics on voice recognition system
performance. She considered sex (male vs female), rank (officers
vs enlisted), and extent of training (three, five or ten training
trials).

Batchellor's study used essentially the same equipment as
previously discussed efforts. Subjects were introduced to the
equipment and the nature of the experiment explained. Following
familiarization, actual "training" of the system commenced. One
objective of the effort involved consideration of the
relationship between repetition of each utterance and
performance. That is, the manufacturer recommends 10 training
passes. However, when using 10 trials, training can be extremely
time consuming. If essentially the same results can be obtained
with less training, a considerable time saving could be realized.
Batchellor investigated performance with three, five and ten
training trials. Order of training passes was randomized so that
each (i.e., three, five and ten) was used first and with an equal
number of trials. Therefore one-third of the subjects started
with three training passes, one-third started with five and
one-third started with ten. Batchellor used two out of three
correct recognitions as her criterion for "trained".

Subjects for the study consisted of ten female officers, ten
female enlisted personnel, ten male officers, and ten male
enlisted personnel. Enlisted subjects were stationed at the
Naval Postgraduate School.

All but two of the officer subjects were students at NPS.
The remaining two consisted of an officer stationed at Fort Ord
and an officer stationed at the Joint Chiefs of Staff.


Vocabulary used by Batchellor consisted of 50 utterances.
Utterances varied in length from one to five syllables. Criteria
for selection were based on matching the number of utterances in
each syllable category (i.e., having an equal number of two
syllable utterances as three syllable, etc.).

Results of Batchellor's effort indicated that sex was not a
major factor in performance. Machine recognition performance was
slightly better for men (error rate of 1.8%) than for women
(error rate of 2.1%). This difference, however, was not
statistically significant. These results indicate that voice
characteristics, as reflected in male-female differences, do not
represent a major problem in system performance.

In terms of the relationship between system performance and
rank, enlisted personnel had a slightly lower mean error
percentage (1.85%) than officer subjects (2.05%). This
difference was not statistically significant and one can conclude
that rank in and of itself did not represent a major influence
on performance.

The relationship between training trials produced some very
interesting results. Batchellor observed no difference between
five training trials and ten training trials (1% error for both
rank and sex). However, even though a significantly greater
number of errors were observed with three training trials, the
percentage error rate was still only three percent.

Interestingly, there did appear to be a relationship between
error rates and rank. Initial indications suggested enlisted
performance was superior to officer with the reduced (3 training
passes) training trials. That is, there did appear to be a
significant rank by number of training passes interaction. The
reason for this interaction is unclear and it is possible the
results were spurious. These findings do suggest the need to
pursue the question of rank, and all the parameters that rank
implies, in relation to performance of the system. It is possible
that certain characteristics of the rank structure may influence
performance.

The interesting finding here, however, was the fact that
within the conditions of the present experiment, little
difference was observed between five and ten training trials.
These findings could be of major importance and certainly merit
further study. For example, does the relationship hold under an
expanded vocabulary? Or can performance be maintained with fewer
training trials providing the criterion for acceptance is made
more rigid?

Neil and Andreason (1981) examined the bilingual capability
of the T600 speech recognition system. That is, in many military
situations (e.g., NATO Command and Control Center) it is possible
that an operator may be required to interact with a speech
recognition system in an "official" language that is different
than his/her "natural" language. Even with the user that is
quite proficient and fluent with the "official" language, the
potential for reversion to "natural" language may be considerable
under certain circumstances.

The objective of Neil and Andreason's effort was to examine
the ability of the T600 to recognize utterances in either
language when training had occurred in both languages.
Essentially, the effort was designed to investigate the ability
of the T600 to function in a bilingual mode.

Equipment used included a T600 voice recognition
system with additional memory modules which expanded its
capability to 256 discrete utterances of .1 to 2 seconds each. In
the actual experiment only 105 discrete utterances were used.

Subjects consisted of 16 volunteers: 12 males and four
females. Male subjects were West German officer students at the
Naval Postgraduate School. Female subjects were wives of German
officer students at NPS. All subjects were bilingual
(German/English) with German being the natural language in all
cases. All subjects were volunteers and received no compensation
for participation.

A 105 utterance list was proposed for use in the research
effort. Utterances were selected on the basis of their possible
application in a Command/Control type environment. No attempt was
made to control for syllable count in either language, nor was
any utterance accepted or rejected on the basis of its potential
for accuracy in recognition.

The procedure required that each subject "train" each
utterance three times. Subjects repeated each utterance 10 times
in English followed by testing in English; trained each
utterance 10 times in German, followed by testing in German; and
repeated each utterance 5 times in English and 5 times in German
followed by recognition testing in English and German. Actual
order of training and testing was randomized to control
for potential interactions between training sequence and
recognition performance.

Translation of English to German was performed by one of the
experimenters. This was done to reduce variability in the German
list. It was observed that without such control considerable
variability in translation of English to German was possible.

Performance measures were considered in terms of recognition
accuracy under training/testing conditions described earlier.
Performance measures included misrecognition (i.e., incorrect
recognition) and non-recognition (i.e., inability of the system to
match test utterances with any trained utterance).
Misrecognition and non-recognition were both considered as errors
and were given equal weight in analysis.

Design of the experiment was of a repeated measures type in
which each subject served as his own control and was therefore
tested under all conditions. The design selected allowed for
determination of the training effect and a reduction in
variability associated with individual differences.

In addition, as a result of the nature of data obtained, the
authors analyzed raw data and arcsin transformed data. Arcsin
transformation put data into a form that would more nearly
satisfy the assumptions underlying analysis of variance.
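
For readers unfamiliar with the arcsin transformation, the sketch below shows the form most often applied to proportions such as error rates: the arcsine of the square root of the proportion. The exact variant used by the authors is not stated in the text (some analysts multiply the result by two), so this is an assumption, and the function name and sample error rates are hypothetical.

```python
import math


def arcsine_transform(proportion: float) -> float:
    """Arcsine square-root transform of a proportion in [0, 1], in radians."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must lie between 0 and 1")
    return math.asin(math.sqrt(proportion))


# Hypothetical error proportions, for illustration only.
error_rates = [0.01, 0.03, 0.10, 0.25]
transformed = [arcsine_transform(p) for p in error_rates]
print([round(t, 4) for t in transformed])
```

The transform stretches values near 0 and 1, which tends to make the variance of proportions less dependent on their mean, one of the conditions analysis of variance assumes.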

Analysis of both raw and arcsin transformed data supported a
highly significant training language effect. When
training/testing occurred with a single language (i.e.,
English/English or German/German) no difference was observed.
However, when training involved both languages performance was
significantly degraded. Further analysis revealed that neither
language contributed a disproportionate amount to performance
degradation.

In summary, the report indicated that the T600 could
function equally well in either of the two languages studied
(English or German) alone. However, when required to perform in
a bilingual mode the variation in each utterance produced such a
complex array that the T600 could not develop a satisfactory
reference matrix and performance was severely degraded.

Therefore the study by Neil and Andreason suggests that any
setting in which bilingual use could be anticipated would almost
certainly result in a reduction in recognition performance.

In any operational configuration an important consideration
in voice recognition performance is time and vocabulary size.
That is, if operators were required to "retrain" the system
frequently when repeated use was required, the time required and
inconvenience created could seriously degrade the overall
effectiveness and usability of the system. Obviously such a
situation would be compounded with increased vocabulary.

Poock (1981) identified this potential problem area and
initiated an experiment to investigate the potential for
performance degradation as a function of time and vocabulary
size. Subjects initially consisted of six military and two
civilian. Two of the subjects were female. One male subject was
forced to withdraw at the 8th week, leaving a total of 7 subjects.
Length of the effort was 21 weeks.

The system utilized was the Threshold Technology, Inc.
Model T600 voice recognition system. Subjects observed the
recommended training sequence (i.e., each subject repeated each
utterance 10 times). Vocabulary consisted of 240 utterances.
Following completion of training, each utterance was repeated
three times. Criterion for successful training was correct
recognition two out of three passes. In the event the utterance
was not correctly recognized two out of three times, the
utterance was retrained. Once criterion was reached, training
patterns were not changed during the remaining 20 weeks of the
effort.

In addition, two subjects (one male and one female) trained
the T600 in a "joint mode". In "joint mode" the two subjects
each trained each utterance 5 times. The same criterion for
recognition was adhered to and remained unchanged for the
following 20 weeks of the experiment.

It should be mentioned that six of the eight subjects had minimal
previous exposure to the T600 voice recognition system (roughly
one month). The two subjects participating in the "joint mode"
function were "experienced" in that each had at least one year of
experience with the system.

For the experiment, the 240 utterance list was divided into
20 utterance segments. Each segment consisted of: two 1 syllable
utterances, six 2 syllable utterances, four 3 syllable utterances
and four 5+ syllable utterances. The utterance list was selected
from time and frequency of use experiments in a Command Center.


The procedure required subjects to participate each week
for [...]. During weekly testing each subject repeated each
utterance [...]. The procedure involved expanding the window by
20 utterance segments. That is, each subject was first tested
only on utterances 0 - 19. Once utterance 19 was repeated the
window was expanded to include utterances 0 - 39, followed by
0 - [...] to see if vocabulary size significantly [...]. The
procedure allowed for examination of performance with a small
vocabulary (20 utterances) and expanding utterance increments up
to and including the full 240 utterance list.
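
The expanding window procedure (a 240 utterance list tested in cumulative 20 utterance segments, beginning with utterances 0 - 19) can be sketched as follows. The zero-based numbering follows the text; the function name and the use of Python ranges are illustrative assumptions, and the sketch makes no claim about how often each window was repeated in the actual study.

```python
def expanding_windows(vocabulary_size: int = 240, segment: int = 20) -> list[range]:
    """Cumulative test windows: utterances 0-19, then 0-39, and so on up to the full list."""
    return [range(0, end) for end in range(segment, vocabulary_size + 1, segment)]


windows = expanding_windows()
for window in windows[:3]:
    # Show the first and last utterance index covered by each early window.
    print(window[0], "-", window[-1])
print(len(windows), "windows; the last one covers", len(windows[-1]), "utterances")
```

Because each window includes all earlier segments, utterances in the first segment are tested under every vocabulary size, which is what allows error rate to be compared as the active vocabulary grows.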

The two subjects selected for the "joint mode" performed as
above as well as providing an additional 4[?]0 repetitions for
examination of joint reference pattern performance.

At the completion of 20 weeks all subjects retrained each
utterance which had been misrecognized during the 20 week testing
schedule. Following retraining subjects completed testing for
the 21st week.

Analysis of the results of Poock's longitudinal effort
indicated time and vocabulary size were not significant factors in
system performance. As expected there were between individual
differences. However, the results indicated no significant
within individual differences over the period of testing. In fact,
over the 21 week testing period there was less than a 1.7%
variation in recognition performance. The suggestion here is that
reference voice patterns, over the 21 week period at least,
remained very stable. Further, it will be recalled that prior to
the 21st week all utterances misrecognized during the previous 20
weeks were retrained. The normal expectation would be a
significant improvement in recognition during the 21st week.
However, while some slight improvement was indicated, improvement
was not statistically significant. Observed improvement could
have just as easily resulted from "end spurt" as from retraining.

Vocabulary size did not significantly affect recognition
performance either. Voice recognition remained relatively
uniform as vocabulary size increased, with statistical analysis
indicating no increases in error rate. There was an indication
that error rate was related to the number of syllables in an
utterance. Increasing the number of syllables in an utterance
resulted in decreased recognition performance. The suggestion
here is that vocabulary size may not be a factor in performance,
but that structure (syllable count) may.

One very interesting aspect of Poock's effort was the joint
reference pattern investigation. Performance under joint
conditions was very impressive. In fact, performance degradation
was .7% when compared to their own patterns. The male subject's
performance was superior to any other subject using their own
individual reference patterns.

The longitudinal study conducted by Poock demonstrated that
performance was not seriously degraded over time. The observed
stability suggests that re-training of voice patterns may not be
necessary with prolonged use. Further, the effort certainly
suggests the possibility of joint reference patterns at least for
critical or "stop action" inputs.



One potentially disruptive influence in voice recognition
is the concept of stress. Armstrong (1980), in a comprehensive
examination of the effects of workload on voice recognition,
studied the problem of task-induced stress on overall system
performance.

As in previously described efforts, Armstrong employed a
T600 voice recognition system. The vocabulary consisted of 50
distinct utterances. Thirty of the utterances were selected from
the Modified Rhyme Test, which is commonly used in the
determination of speech intelligibility of communication systems.
Sixteen of the 30 words were eight pairs of rhyming
words. In each such pair the only difference between words was
the initial consonant. For example, the words "beat" and "peat"
would constitute such a pair. The remaining 14 words consisted
of seven pairs of non-rhyming but similar words (e.g., "sap" and
"sat"). Twenty of the 50 utterances were selected by Armstrong
from single words commonly used in Command and Control
environments. These utterances were distinct and more easily
distinguishable than the 30 utterances selected from the rhyming
test.
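
As a rough illustration of the pair structure described above, the
following Python sketch checks whether two words differ only in their
first letter. It works on spelling rather than phonemes, so it only
approximates "initial consonant", and the function name is
hypothetical rather than anything used in Armstrong's study.

```python
def differs_only_in_initial_letter(a: str, b: str) -> bool:
    """True when two equal-length words share everything but the first letter."""
    a, b = a.lower(), b.lower()
    return (
        len(a) == len(b)
        and len(a) > 1
        and a[0] != b[0]
        and a[1:] == b[1:]
    )


# Example pairs quoted in the text.
print(differs_only_in_initial_letter("beat", "peat"))  # rhyming pair
print(differs_only_in_initial_letter("sap", "sat"))    # similar, non-rhyming pair
```

A spelling-based check of this kind separates the two example pairs
given in the text, but a real vocabulary screen would need a phonetic
comparison.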

All words used were either one or two syllables. Selection
was actually based on an attempt to "confuse" the T600. This
intentional confusion was attempted in order to demonstrate a
decrement resulting from the loading task. A similar objective
could have been satisfied by a considerable expansion of the
vocabulary. Such an expansion would have required a considerable
increase in testing time, and Armstrong felt the same objective
could be accomplished by increasing the potential for confusion.
Therefore, the vocabulary was purposely selected to increase the
likelihood of recognition errors.


Subject loading was accomplished through the use of a
pursuit tracker. The task involved tracking a .75 inch square
light target travelling in a clockwise direction at a constant 40
rpm rate. The tracking task was made up of a circular tracking
task and a square-like task. Performance was based on time on
target.

Procedure consisted of a brief orientation followed by
familiarization with equipment. Subjects then "trained" the 50
word vocabulary. The two out of three criterion was applied for
successful training.

Experimental conditions consisted of three levels of motor
loading (tracking) combined with the voice recognition task. In
one condition, no tracking task (NTT), there was no tracking
requirement. This condition assumed no motor loading. Subjects
were also required to perform the circular tracking task (CTT)
and the square-like tracking task (STT). During combined
tracking and voice recognition, it was emphasized that
voice was the primary task. The NTT, CTT and STT conditions were
presented in different orders, thereby controlling for
any learning or ordering effect.

In what is assumed to be an attempt to examine the effects
of time on task, Armstrong had subjects repeat two different
consecutive random orderings of the vocabulary words. The
first pass through the vocabulary was considered the first half
of the trial and the second pass was referred to as the second
half of the trial. Subjects were not informed as to when they
had completed half of a trial.

Armstrong was also interested in the possibility that
subjective fatigue might influence performance. Accordingly, each
subject was administered the "Feeling Tone" Checklist upon
completion of each condition (Pearson and Byars, 1956).

Analysis consisted of an examination of (a) recognition
errors, (b) subject verbal errors, (c) influence of subjective
fatigue and (d) tracking performance. Recognition errors were
defined as a failure of the T600 to correctly recognize a
vocabulary word. This included both incorrect recognition and
rejection of the word (non-recognition). Verbal errors were
defined as the failure of a subject to correctly repeat a
presented word. As suggested earlier, tracking performance was
evaluated in terms of time on target, that is, the amount of
time subjects were able to maintain contact with the rotating bug
with their wand. Subjective fatigue was evaluated by the method
suggested by Pearson and Byars (1956).
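
The error definitions above can be sketched as a small scoring
routine. This is only an illustrative Python sketch: the function
names, the trial tuple layout, and the choice to exclude verbal-error
trials from the recognition error rate are assumptions, not details
taken from Armstrong's report.

```python
from typing import Iterable, Optional, Tuple

Trial = Tuple[str, str, Optional[str]]  # (presented, spoken, recognized)


def score_trial(presented: str, spoken: str, recognized: Optional[str]) -> str:
    """Classify one trial; `recognized` is None when the recognizer rejects."""
    if spoken != presented:
        return "verbal_error"      # subject failed to repeat the word
    if recognized is None:
        return "rejection"         # recognizer reported non-recognition
    if recognized != spoken:
        return "misrecognition"    # recognizer returned the wrong word
    return "correct"


def recognition_error_rate(trials: Iterable[Trial]) -> float:
    """Percent of correctly spoken words that were rejected or misrecognized."""
    outcomes = [score_trial(*t) for t in trials]
    scored = [o for o in outcomes if o != "verbal_error"]  # assumption
    errors = sum(o in ("rejection", "misrecognition") for o in scored)
    return 100.0 * errors / len(scored) if scored else 0.0


# Hypothetical trials for illustration only.
example = [
    ("beat", "beat", "beat"),  # correct
    ("beat", "beat", "peat"),  # misrecognition
    ("sap", "sap", None),      # rejection
    ("sat", "sap", "sap"),     # verbal error, excluded from the rate
]
print(round(recognition_error_rate(example), 2))
```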

Results of Armstrong's effort suggest that loading the
operator did influence recognition performance. Specifically,
when all word types and both trial halves were considered, NTT
resulted in an error rate of 10.51%, CTT in an error
rate of 14.43%, and STT in an error rate of 14.73%. By
trial half, including all word types and loading conditions, the
first half was slightly better: 12.71% versus 13.73% for the second half.
By vocabulary word type, over all loading conditions and both trial
halves, rhyming words had an error rate of 25.67%,
non-rhyming words an error rate of 12.91%, and operational words an
error rate of 3.48%. The overall error rate was 13.22%.

Analysis revealed that motor loading did affect recognition
performance. The difference between NTT and the loading
conditions of CTT and STT was significant at P < .10. Error rate
also differed by vocabulary word type (rhyming, non-rhyming but
similar, and operational). A non-parametric analysis technique
indicated that pairwise comparisons of recognition error rate
were significant at P < .01. The conclusion is that the
recognition error rate for each vocabulary word type
differed from that of each other word type.

Loading also influenced operator verbal performance. Not
surprisingly, with increased task loading a subject's ability to
repeat the stimulus word correctly was degraded. Subjective
fatigue was not found to be a significant factor in overall
performance.

In summary, motor loading did negatively affect recognition
and verbal performance in Armstrong's effort. The nature of the
secondary tracking task was such that successful performance was
extremely demanding. It was actually surprising that recognition
and verbal performance did not suffer greater degradation.
Pursuit tracking requires continuous effort on the part of a
subject and is an excellent source of task-induced stress.
However, it is difficult to imagine a real world situation that
would require the same level of sustained attention and
performance. The question of realistic motor loading and its effect
on performance is of considerable merit and should be pursued.
Armstrong has shown that even though recognition performance
suffered as a result of motor loading (i.e., error rates were
roughly 10 times normal), the T600 performed extremely well given
the nature of the task. In fact, when one considers the nature
of the tracking task, it should be obvious that simultaneous
manual performance would be difficult if possible at all.
Therefore, all things considered, the ability of subjects and
equipment to function at the rather high levels observed suggests
the significance of Armstrong's effort.

In a followup effort, Armstrong and Poock (1981) examined
the effects of mental loading on recognition performance. The
interest was in examining the potential relationship
between increased mental load (i.e., over that experienced by
subjects during training of the T600) and performance of the
voice recognition system. As in Armstrong's (1980) effort, the
assumption was that load may result in altered voice
characteristics which would degrade overall system ability. The
Poock and Armstrong (1981) effort was obviously designed to augment
the work of Armstrong (1980).

The voice recognition portion of Poock and Armstrong's study
was essentially the same as the earlier effort of Armstrong.
Vocabulary, training, equipment, etc. were basically the same for
the two studies.

The loading portion of the effort was accomplished through
the use of a General Dynamics Response Analysis Tester (RATER).
The device is an effective instrument for investigating response
speed/accuracy as well as short term memory. In the Poock and
Armstrong study the RATER was used to generate and display random
sequences of four individual symbols (i.e., circle, cross,
diamond and triangle). Symbols were presented at a constant rate
of one symbol every 1.5 seconds. Response buttons appropriately
labeled with the four symbols were provided to subjects.

An interesting feature of the RATER, and one potentially
valuable for the Poock and Armstrong study, is the ability to
program "delay" modes into the system. Delay modes enable the
investigator to "delay" the proper response to the current
stimulus. In other words, in delay mode zero the proper response
is the currently presented stimulus. In delay mode one the
proper response is the response button labeled with the
previously displayed stimulus, in delay mode two it is the
symbol which appeared two trials back, etc.

As such, in delay modes a subject is forced to recall
stimuli presented one, two, three, etc. trials previously rather
than the currently displayed information. Such a system is
capable of placing a considerable mental load on subjects.
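
The delay-mode rule can be sketched in a few lines of Python. This is
a minimal illustration, not a model of the actual device: the function
name is hypothetical, and returning None for the first trials (where
no earlier stimulus exists yet) is an assumption, since the text does
not say how the RATER handled the start of a sequence.

```python
from typing import List, Optional


def expected_responses(stimuli: List[str], delay: int) -> List[Optional[str]]:
    """Correct response button for each trial under a given delay mode.

    Delay 0 means respond to the current symbol; delay n means respond
    to the symbol shown n trials back. Trials with no symbol n trials
    back yield None (an assumption made for this sketch).
    """
    if delay < 0:
        raise ValueError("delay must be non-negative")
    return [
        stimuli[i - delay] if i >= delay else None
        for i in range(len(stimuli))
    ]


# Hypothetical symbol sequence using the four symbols named in the text.
sequence = ["circle", "cross", "diamond", "triangle", "cross"]
for mode in (0, 1, 2):
    print(mode, expected_responses(sequence, mode))
```

In delay mode two, for example, the subject seeing "diamond" on the
third trial must press "circle", which is what makes the task a
short term memory load rather than a simple reaction task.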

In the Poock and Armstrong effort, delay conditions of zero,
one and two trials back, as well as a condition with no mental
loading, were employed.

Subjects consisted of 24 volunteers. Twenty-two were male
U.S. military officer-students at the Naval Postgraduate School.
A female civilian and one Canadian military officer completed
the subject population. Sixteen of the subjects were designated
as being experienced with voice recognition equipment (2-10
hours) and eight subjects had no experience with voice
recognition. Two of the 24 had had a brief (1/2 hour) experience
with the RATER.

To reiterate, the hypothesis was that increased mental
loading would result in changes in voice characteristics
sufficient to degrade the recognition ability of the T600.

System performance was considered in terms of loading, trial
half, T600 experience and vocabulary word type. Under loading,
Poock and Armstrong observed that with no RATER loading the error
rate was 10.77%; with zero delay, 13.18%; with delay one, 13.14%;
and with delay two, 13.60%. The suggestion here is that the only
difference was between no external loading and the three loading
conditions. This observation was confirmed by statistical
analysis, which indicated that the only significant difference
existed between no loading (NRT) and the three loading
conditions: zero delay (RD0), delay one (RD1) and delay two (RD2).

Recognition error rate was also observed to be higher during
the first half of testing than during the second half. Further,
experience level did not appear to influence T600 performance,
and there were no significant interactions.

In terms of subject performance, as contrasted with T600
performance, the indication was that operator loading had a
significant effect on subject verbal error rate and that experience
level was also a significant factor. A surprising aspect of
experience was the observation that "little experience" level
subjects had a higher error rate than "no experience" subjects.
No explanation was offered for this observation, and in fact these
findings may be spurious. This observation needs further study
to examine the potential reasons for this finding.

In general, Poock and Armstrong confirmed the following:

(1) Operator mental loading affected performance
(recognition error rates were 23% greater with loading
than under the no mental loading condition).

(2) Performance appeared to be sensitive to trial half
(i.e., recognition performance during the first 2.5
minutes differed from the second 2.5 minutes).

(3) T600 recognition errors were not influenced by
experience level. This finding differs from the
previous observations of subject performance, where
initial findings suggested an experience factor.
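
The size of the loading effect in item (1) can be checked roughly
against the error rates reported earlier in this section. The Python
sketch below assumes the three loading conditions are averaged with
equal weight, which the text does not state; under that assumption the
relative increase comes out between 23 and 24 percent.

```python
# Error rates (percent) as reported for the Poock and Armstrong study.
no_load = 10.77
loaded = {"zero delay": 13.18, "delay one": 13.14, "delay two": 13.60}

# Assumption: equal weighting of the three loading conditions.
mean_loaded = sum(loaded.values()) / len(loaded)

# Relative increase of the loaded error rate over the unloaded rate.
relative_increase = 100.0 * (mean_loaded - no_load) / no_load

print(f"mean loaded error rate: {mean_loaded:.2f}%")
print(f"relative increase over no loading: {relative_increase:.1f}%")
```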

In summary, Poock and Armstrong's results were significant
in that overall performance seems to be affected by mental
loading. This finding could be extremely important in that the
relationship between mental loading and performance could be
indicative of what might be expected in operational military
situations. Therefore, the area merits additional research to
determine the extent of mental loading influence, possible
relationships between nature of mental loading and performance, etc.

Equipment

Very few experimental efforts for voice recognition at NPS
have dealt with equipment modification. Most studies have taken
the existing system and investigated its capability under
various tasking or environmental conditions.


Schwalm (1982) recognized that under certain operational
military environments certain equipment modifications may be
required for satisfactory functioning. He postulated that under
conditions where several operators were performing a task, each
using a separate recognizer, the potential for confusion could be
quite high and recognition error potential increased.

Schwalm suggested that one method for possibly decreasing
errors in such a multioperator environment would be the addition
of a mechanism whereby speaker commands could be directed to the
microphone and which would, further, reduce the
possibility of recognizable sounds or utterances being released
to the surrounding environment. He suggested the addition of a
"mask" to currently available systems as one potential method for
improving overall system performance in the multioperator
environment.

The expressed objective of Schwalm's experiment was to
examine the accuracy of an available voice recognition system with
the addition of a "stenographer's" mask as compared to the
conventional input device.

Initially  36  subjects  (32  males  and  4  females) 
participated  in  the  study.  However,  as  a  result  of  the  duration 
of  the  experiment  and  resultant  scheduling  problems,  data  was 
analyzed  from  18  subjects  (14  males  and  4  females). 

Equipment consisted of two T600 voice recognition systems.
Both systems were capable of handling 256 discrete utterances.
Three input methods were involved in Schwalm's study. First, a
conventional input device (SM10 boom microphone mounted on a
headset). This is the normal input device for the T600. Second,
a stenographer's mask with a microphone supplied by the
manufacturer. The third input system consisted of a stenomask
fitted with an SM10 microphone.

Training of the speech recognizer was the standard process
of 10 training trials per utterance. Testing consisted of two
passes of the entire vocabulary on each of three successive days.
Therefore six testing trials were run for each subject under each
of the mask conditions.

In terms of total errors (misrecognition and nonrecognition)
there was a significant mask effect. Results indicated a
significant difference between no mask and both mask conditions.
No difference existed between the mask conditions.

In terms of nonrecognition, no significant differences
were observed. However, for misrecognition a significant
difference was observed between no mask and both the original and
the Shure mask.

One interesting observation was the fact that performance
deteriorated over trials. This was true of total errors and
misrecognitions. The author was unable to attribute these
observations to any specific event and therefore considered these
observations to be spurious. This assumption may or may not be
valid and certainly warrants further consideration.

Schwalm also considered the potential influence of
experience with masks and experience with microphones on
performance. Subjects were divided into two groups (high
experience and low experience) for both mask and microphone use
experience. Results suggested that mask experience was a
significant variable in performance. Differences were observed
between the no mask condition and both mask conditions in the
group with low previous experience. In the high experience
group, significant differences were observed between the no mask
condition and the original mask condition.

In terms of microphone experience, differences were observed
between the no mask and the Shure mask condition for the low
experience group and between no mask and the original mask for
the high experience group.

Schwalm's effort is significant in that many military
environments may involve the use of masks in the operational
setting. The fact that there was a slight (1.5 percent) increase
in errors between no mask and the average of the two masked
conditions suggests a potential for performance degradation.
Admittedly the degradation observed was slight (performance with
the mask was 94.7 percent correct recognition). The observation
suggests the possibility that certain operational environments
introduce perturbations that, when considered in
combination with other factors, may degrade performance
sufficiently to warrant a new mask design configuration. Further,
the observation that experience may be a factor in performance
suggests the possibility of overcoming any degradation through
training. Future efforts might consider the interaction between
mask and experience/training in a simulated operational setting.

Summary

The research efforts on voice recognition are impressive in
suggesting the feasibility and potential utility of voice as an
input mechanism for man-machine systems. Obviously, as for any
technological advance, additional research is suggested by the
completed work. On the basis of current findings it would seem
that one area in need of pursuit is the possible influence of
task specificity. Similar task types occasionally produced
somewhat contradictory results. Questions arise as a result of these
observations as to whether the observed differences were the
result of task specific conditions or the result of
subtle experimental design issues.

It is also obvious that additional work on the potential
influence of varied environmental factors should be pursued. The
environments explored (e.g., noise) need to be expanded upon and
other potential physical environmental factors (e.g., vibration)
need to be explored. In addition, it may be that various
psychological environments contribute to performance, and
these are areas of merit.

One area not considered and of potential importance is the
general area of acceptance of the system by users and managers.
Mecherikoff and Mackie (1970) suggested that operational
military personnel frequently fail to totally accept and
occasionally totally reject innovation in operational equipment
and procedure. This resistance to change is not unique to the
military and in fact appears to be a universal human
characteristic. In the military, nonacceptance of new
equipment/technology can result in system failure or system
rejection.

Based on subjective questioning of "users" in the artificial
laboratory environment, rejection would not appear to be a
significant problem. It must be realized that the subjects
involved in experimentation are not really users in the
operational sense and therefore the data collected may well be
inapplicable. This area of technology acceptance should be
considered in further research efforts.

In summary, the potential of using speech recognition in the
military environment is impressive. Efforts conducted at the
Naval Postgraduate School have been successful in suggesting the
variety of tasks and environments in which speech recognition is
an effective input device. Research in the operational
environment would appear to be a most appropriate next phase of a
total research program. Such efforts should constitute
"research" as opposed to "demonstrations", however. Attempts
should be made to examine the utility and effectiveness of voice
in valid operational settings.

Further, the entire area of acceptance, particularly with a
system as novel as voice recognition, would appear to be of
considerable merit. For even if the system is effective and
offers definite advantages over more traditional systems, such
advantages are lost if management and/or the individual user is
unwilling to maximize the benefits and take advantage of the
power of the system.

Obviously, considerable research needs to be accomplished
before the potential and/or the limitations of voice recognition
as a vehicle for human input to machine are realized. In fact, if
one considers only those elements (i.e., equipment, environment,
task and personnel) which have been suggested as determining the
efficiency with which men interact with machine, it is obvious
that some of the elements have received very little attention
(e.g., environment). Individual "elements" need additional
pursuit, as well as possible interaction between elements.
Questions such as task specificity, different physical
environments, training, etc. need to be addressed in future research.
However, work already accomplished is certainly suggestive of the
potential and can be considered as indicating that voice input is
a reasonable and attractive alternative in many situations
currently employing manual entry. Reliability of the system has
been proven; now the question is one of specific application.